On Constraints on the Search Path of Policy Iteration

Author

  • Omid Madani
Abstract

We describe a few structural properties enjoyed by the policy space of problems such as infinite-horizon MDPs. From these properties we derive constraints limiting the number of iterations of algorithms such as the policy iteration algorithm for infinite-horizon MDPs and the Hoffman-Karp algorithm for simple stochastic games. An open problem is to characterize the growth of the worst-case number of iterations of these algorithms subject to the derived constraints.
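To make "number of iterations" concrete, the following is a minimal sketch of policy iteration that counts its improvement rounds. It is not taken from the paper: it assumes a discounted infinite-horizon MDP, approximates policy evaluation with repeated Bellman backups, and switches every improvable state in each round. All names (`evaluate`, `policy_iteration`, `P`, `R`, `gamma`) and the two-state example are hypothetical.

```python
def evaluate(policy, P, R, gamma, sweeps=2000):
    """Approximate the value of a fixed policy by repeated Bellman backups."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
    return v


def policy_iteration(P, R, gamma=0.9, tol=1e-9):
    """Return (policy, values, iterations) for a small discounted MDP.

    P[s][a] is a list of transition probabilities out of state s under
    action a; R[s][a] is the immediate reward. Both are assumed inputs.
    """
    n = len(P)
    policy = [0] * n  # arbitrary starting policy
    iterations = 0
    while True:
        v = evaluate(policy, P, R, gamma)
        iterations += 1
        new_policy = list(policy)
        for s in range(n):
            def q(a):
                # One-step lookahead value of taking action a in state s.
                return R[s][a] + gamma * sum(
                    p * v[t] for t, p in enumerate(P[s][a])
                )
            best = max(range(len(P[s])), key=q)
            # Switch only on a strict improvement, so the loop terminates.
            if q(best) > q(policy[s]) + tol:
                new_policy[s] = best
        if new_policy == policy:
            return policy, v, iterations
        policy = new_policy


# Hypothetical two-state example: action 0 stays put, action 1 moves to the
# other state; only staying in state 1 pays a reward.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, values, iterations = policy_iteration(P, R)
```

Because each round strictly improves the policy, no policy is visited twice, so the count is bounded by the number of policies; the constraints the abstract refers to aim to restrict the possible search paths further.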


Similar resources

Corrector-predictor arc-search interior-point algorithm for $P_*(\kappa)$-LCP acting in a wide neighborhood of the central path

In this paper, we propose an arc-search corrector-predictor interior-point method for solving $P_*(\kappa)$-linear complementarity problems. The proposed algorithm searches the optimizers along an ellipse that is an approximation of the central path. The algorithm generates a sequence of iterates in the wide neighborhood of central path introduced by Ai and Zhang. The algorithm does not de...


A path-following infeasible interior-point algorithm for semidefinite programming

We present a new algorithm obtained by changing the search directions in the algorithm given in [8]. This algorithm is based on a new technique for finding the search direction and the strategy of the central path. At each iteration, we use only the full Nesterov-Todd (NT) step. Moreover, we obtain the currently best known iteration bound for the infeasible interior-point algorithms with full NT...


The optimal search for a Markovian target when the search path is constrained: the infinite-horizon case

A target moves among a finite number of cells according to a discrete-time homogeneous Markov chain. The searcher is subject to constraints on the search path, i.e., the cells available for search in the current epoch are a function of the cell searched in the previous epoch. The aim is to identify a search policy that maximizes the infinite-horizon total expected reward earned. We show the foll...


Providing an algorithm for solving general optimization problems based on Domino theory

Optimization is a very important process in engineering. Engineers can produce better products only if they use optimization tools to reduce costs, including time consumption. Many real-world engineering problems cannot be solved mathematically (by mathematical programming solvers). Therefore, meta-heuristic optimization algorithms are needed to solve these pro...


An Efficient Method for Selecting a Reliable Path under Uncertainty Conditions

In a network in which some paths may be blocked, choosing a reliable path, one whose survival probability is high, is an important and practical issue. This issue is especially important in critical situations such as natural disasters, floods and earthquakes. In the case of the reliable path, survival or blocking of each arc on a network in critical situations is an un...




Publication date: 1999